Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation

Authors

  • Anne Lacheret
  • Nicolas Obin
  • Mathieu Avanzi
Abstract

In the area of large French speech corpora, there is a demonstrated need for a common prosodic notation system that allows for easy data exchange, comparison, and automatic annotation. The major questions are: (1) how to develop a single, simple scheme of prosodic transcription that could form the basis of guidelines for non-expert manual annotation (NEMA), used for linguistic teaching and research; (2) based on this NEMA, how to establish reference prosodic corpora (RPC) for different discourse genres (Cresti and Moneglia, 2005); (3) how to use the RPC to develop corpus-based learning methods for automatic prosodic labelling in spontaneous speech (Buhman et al., 2002; Tamburini and Caini, 2005; Avanzi et al., 2010). This paper presents two pilot experiments conducted with a consortium of 15 French experts in prosody in order to provide a prosodic transcription framework (transcription methodology and transcription reliability measures) and to establish reference prosodic corpora in French.

Related resources

Prosody in a corpus of French spontaneous speech: perception, annotation and prosody ~ syntax interaction

Our study focuses on the issue of prosodic annotation and of the prosody ~ syntax interface in conversation and is based on a large corpus of conversational speech in French. The results of inter-transcriber agreement tests show that two expert transcribers are consistent in their labeling of prosodic phrasing and that the consistency is well above chance. A qualitative analysis reveals transcri...

Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners

We explore the use of machine learning techniques (notably SVM classifiers and Conditional Random Fields) to automate the prosodic labelling of French speech, based on modelling and simulating the perception of prosodic events by naïve and expert listeners. The models are based on previous work on the perception of syllabic prominence and hesitation-related disfluencies, and on an experiment on...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

An automatic intonation recognizer for the Polish language based on machine learning and expert knowledge

This paper presents a new automatic intonation recognizer for the Polish language. The recognizer design combines Machine Learning and expert knowledge techniques. Machine Learning is used in pitch stylization (Artificial Neural Network), speech alignment (external design based on Hidden Markov Model) and intonation decoding (Hidden Markov Model). Expert knowledge drives phonemization, sy...

Naïve listeners’ perception of prominence and boundary in French spontaneous speech

Our main goal here is to explore the link between naïve listeners’ perception of prominences and boundaries in spontaneous speech and experts’ annotation of prosodic hierarchy and accentuation in French. We first present the design of our corpus, which consists of 133 utterances extracted from the Corpus of Interactional Data (CID). 73 naïve listeners judged prominences and boundaries using thr...

Publication date: 2010